MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Reference-Free Validation of Short Read Data

Internal identifier: 000777 (Ncbi/Merge); previous: 000776; next: 000778

Authors: Jan Schröder [Australia]; James Bailey [Australia]; Thomas Conway [Australia]; Justin Zobel [Australia]

Source: PLoS ONE (eISSN 1932-6203), 2010

RBID: PMC:2943903

Abstract

Background

High-throughput DNA sequencing techniques offer the ability to rapidly and cheaply sequence material such as whole genomes. However, the short-read data produced by these techniques can be biased or compromised at several stages in the sequencing process; the sources and properties of some of these biases are not always known. Accurate assessment of bias is required for experimental quality control, genome assembly, and interpretation of coverage results. An additional challenge is that, for new genomes or material from an unidentified source, there may be no reference available against which the reads can be checked.

Results

We propose analytical methods for identifying biases in a collection of short reads, without recourse to a reference. These, in conjunction with existing approaches, comprise a methodology that can be used to quantify the quality of a set of reads. Our methods involve use of three different measures: analysis of base calls; analysis of k-mers; and analysis of distributions of k-mers. We apply our methodology to a wide range of short read data and show that, surprisingly, strong biases appear to be present. These include gross overrepresentation of some poly-base sequences, per-position biases towards some bases, and apparent preferences for some starting positions over others.
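
The three measures named above (base calls, k-mers, and distributions of k-mers) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy reads, and the uniform-expectation threshold are assumptions introduced here purely for illustration.

```python
from collections import Counter


def per_position_base_freqs(reads):
    """Return one dict per read position mapping base -> relative frequency."""
    counts = []
    for read in reads:
        for i, base in enumerate(read):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    return [
        {base: n / sum(c.values()) for base, n in c.items()}
        for c in counts
    ]


def kmer_counts(reads, k):
    """Count every overlapping k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def overrepresented_kmers(counts, k, factor=10.0):
    """Flag k-mers seen more than `factor` times a uniform expectation.

    Assumption: all 4**k k-mers are equally likely, which real genomes violate.
    """
    total = sum(counts.values())
    expected = total / 4 ** k
    return {kmer: n for kmer, n in counts.items() if n > factor * expected}


def kmer_count_spectrum(counts):
    """Map each occurrence count to the number of distinct k-mers having it."""
    return Counter(counts.values())


# Toy reads, invented for this example (not data from the paper).
reads = ["ACGTAC", "AAAAAA", "ACGTTT", "AAAAAC"]
freqs = per_position_base_freqs(reads)
counts = kmer_counts(reads, 3)
flagged = overrepresented_kmers(counts, 3, factor=10.0)
spectrum = kmer_count_spectrum(counts)
print(freqs[0], flagged, dict(spectrum))
```

On the toy reads, position 0 is entirely A and the poly-base 3-mer AAA is the only one flagged, which mirrors in miniature the per-position and poly-base biases the abstract describes. None of the functions needs a reference sequence; a real analysis would need a more careful null model than the uniform expectation assumed here.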

Conclusions

The existence of biases in short read data is known, but they appear to be greater and more diverse than identified in previous literature. Statistical analysis of a set of short reads can help identify issues prior to assembly or resequencing, and should help guide chemical or statistical methods for bias rectification.


Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2943903
DOI: 10.1371/journal.pone.0012681
PubMed: 20877643
PubMed Central: 2943903

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Reference-Free Validation of Short Read Data</title>
<author>
<name sortKey="Schroder, Jan" sort="Schroder, Jan" uniqKey="Schroder J" first="Jan" last="Schröder">Jan Schröder</name>
<affiliation wicri:level="4">
<nlm:aff id="aff1">
<addr-line>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria</wicri:regionArea>
<orgName type="university">Université de Melbourne</orgName>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>NICTA Victoria Research Laboratory, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>NICTA Victoria Research Laboratory, Parkville, Victoria</wicri:regionArea>
<wicri:noRegion>Victoria</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Bailey, James" sort="Bailey, James" uniqKey="Bailey J" first="James" last="Bailey">James Bailey</name>
<affiliation wicri:level="4">
<nlm:aff id="aff1">
<addr-line>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria</wicri:regionArea>
<orgName type="university">Université de Melbourne</orgName>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>NICTA Victoria Research Laboratory, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>NICTA Victoria Research Laboratory, Parkville, Victoria</wicri:regionArea>
<wicri:noRegion>Victoria</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Conway, Thomas" sort="Conway, Thomas" uniqKey="Conway T" first="Thomas" last="Conway">Thomas Conway</name>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>NICTA Victoria Research Laboratory, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>NICTA Victoria Research Laboratory, Parkville, Victoria</wicri:regionArea>
<wicri:noRegion>Victoria</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Zobel, Justin" sort="Zobel, Justin" uniqKey="Zobel J" first="Justin" last="Zobel">Justin Zobel</name>
<affiliation wicri:level="4">
<nlm:aff id="aff1">
<addr-line>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria</wicri:regionArea>
<orgName type="university">Université de Melbourne</orgName>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>NICTA Victoria Research Laboratory, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>NICTA Victoria Research Laboratory, Parkville, Victoria</wicri:regionArea>
<wicri:noRegion>Victoria</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">20877643</idno>
<idno type="pmc">2943903</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2943903</idno>
<idno type="RBID">PMC:2943903</idno>
<idno type="doi">10.1371/journal.pone.0012681</idno>
<date when="2010">2010</date>
<idno type="wicri:Area/Pmc/Corpus">001061</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">001061</idno>
<idno type="wicri:Area/Pmc/Curation">001061</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">001061</idno>
<idno type="wicri:Area/Pmc/Checkpoint">001352</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">001352</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:20877643</idno>
<idno type="wicri:Area/PubMed/Corpus">001F34</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001F34</idno>
<idno type="wicri:Area/PubMed/Curation">001F34</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001F34</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001E24</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">001E24</idno>
<idno type="wicri:Area/Ncbi/Merge">000777</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Reference-Free Validation of Short Read Data</title>
<author>
<name sortKey="Schroder, Jan" sort="Schroder, Jan" uniqKey="Schroder J" first="Jan" last="Schröder">Jan Schröder</name>
<affiliation wicri:level="4">
<nlm:aff id="aff1">
<addr-line>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria</wicri:regionArea>
<orgName type="university">Université de Melbourne</orgName>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>NICTA Victoria Research Laboratory, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>NICTA Victoria Research Laboratory, Parkville, Victoria</wicri:regionArea>
<wicri:noRegion>Victoria</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Bailey, James" sort="Bailey, James" uniqKey="Bailey J" first="James" last="Bailey">James Bailey</name>
<affiliation wicri:level="4">
<nlm:aff id="aff1">
<addr-line>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria</wicri:regionArea>
<orgName type="university">Université de Melbourne</orgName>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>NICTA Victoria Research Laboratory, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>NICTA Victoria Research Laboratory, Parkville, Victoria</wicri:regionArea>
<wicri:noRegion>Victoria</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Conway, Thomas" sort="Conway, Thomas" uniqKey="Conway T" first="Thomas" last="Conway">Thomas Conway</name>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>NICTA Victoria Research Laboratory, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>NICTA Victoria Research Laboratory, Parkville, Victoria</wicri:regionArea>
<wicri:noRegion>Victoria</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Zobel, Justin" sort="Zobel, Justin" uniqKey="Zobel J" first="Justin" last="Zobel">Justin Zobel</name>
<affiliation wicri:level="4">
<nlm:aff id="aff1">
<addr-line>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria</wicri:regionArea>
<orgName type="university">Université de Melbourne</orgName>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>NICTA Victoria Research Laboratory, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>NICTA Victoria Research Laboratory, Parkville, Victoria</wicri:regionArea>
<wicri:noRegion>Victoria</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Base Sequence</term>
<term>Data Interpretation, Statistical</term>
<term>High-Throughput Nucleotide Sequencing (methods)</term>
<term>High-Throughput Nucleotide Sequencing (standards)</term>
<term>Reference Standards</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Sequence Analysis, DNA (standards)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Analyse de séquence d'ADN (méthodes)</term>
<term>Analyse de séquence d'ADN (normes)</term>
<term>Interprétation statistique de données</term>
<term>Normes de référence</term>
<term>Séquence nucléotidique</term>
<term>Séquençage nucléotidique à haut débit (méthodes)</term>
<term>Séquençage nucléotidique à haut débit (normes)</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>High-Throughput Nucleotide Sequencing</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" qualifier="normes" xml:lang="fr">
<term>Analyse de séquence d'ADN</term>
<term>Séquençage nucléotidique à haut débit</term>
</keywords>
<keywords scheme="MESH" qualifier="standards" xml:lang="en">
<term>High-Throughput Nucleotide Sequencing</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Base Sequence</term>
<term>Data Interpretation, Statistical</term>
<term>Reference Standards</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Analyse de séquence d'ADN</term>
<term>Interprétation statistique de données</term>
<term>Normes de référence</term>
<term>Séquence nucléotidique</term>
<term>Séquençage nucléotidique à haut débit</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>High-throughput DNA sequencing techniques offer the ability to rapidly and cheaply sequence material such as whole genomes. However, the short-read data produced by these techniques can be biased or compromised at several stages in the sequencing process; the sources and properties of some of these biases are not always known. Accurate assessment of bias is required for experimental quality control, genome assembly, and interpretation of coverage results. An additional challenge is that, for new genomes or material from an unidentified source, there may be no reference available against which the reads can be checked.</p>
</sec>
<sec>
<title>Results</title>
<p>We propose analytical methods for identifying biases in a collection of short reads, without recourse to a reference. These, in conjunction with existing approaches, comprise a methodology that can be used to quantify the quality of a set of reads. Our methods involve use of three different measures: analysis of base calls; analysis of <italic>k</italic>-mers; and analysis of distributions of <italic>k</italic>-mers. We apply our methodology to a wide range of short read data and show that, surprisingly, strong biases appear to be present. These include gross overrepresentation of some poly-base sequences, per-position biases towards some bases, and apparent preferences for some starting positions over others.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>The existence of biases in short read data is known, but they appear to be greater and more diverse than identified in previous literature. Statistical analysis of a set of short reads can help identify issues prior to assembly or resequencing, and should help guide chemical or statistical methods for bias rectification.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Sanger, F" uniqKey="Sanger F">F Sanger</name>
</author>
<author>
<name sortKey="Nicklen, S" uniqKey="Nicklen S">S Nicklen</name>
</author>
<author>
<name sortKey="Coulson, Ar" uniqKey="Coulson A">AR Coulson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Von Bubnoff, A" uniqKey="Von Bubnoff A">A von Bubnoff</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Johnson, Ds" uniqKey="Johnson D">DS Johnson</name>
</author>
<author>
<name sortKey="Mortazavi, A" uniqKey="Mortazavi A">A Mortazavi</name>
</author>
<author>
<name sortKey="Myers, Rm" uniqKey="Myers R">RM Myers</name>
</author>
<author>
<name sortKey="Wold, B" uniqKey="Wold B">B Wold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mortazavi, A" uniqKey="Mortazavi A">A Mortazavi</name>
</author>
<author>
<name sortKey="Williams, Ba" uniqKey="Williams B">BA Williams</name>
</author>
<author>
<name sortKey="Mccue, K" uniqKey="Mccue K">K McCue</name>
</author>
<author>
<name sortKey="Schaeffer, L" uniqKey="Schaeffer L">L Schaeffer</name>
</author>
<author>
<name sortKey="Wold, B" uniqKey="Wold B">B Wold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
<author>
<name sortKey="Wang, W" uniqKey="Wang W">W Wang</name>
</author>
<author>
<name sortKey="Li, R" uniqKey="Li R">R Li</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Tian, G" uniqKey="Tian G">G Tian</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wheeler, Da" uniqKey="Wheeler D">DA Wheeler</name>
</author>
<author>
<name sortKey="Srinivasan, M" uniqKey="Srinivasan M">M Srinivasan</name>
</author>
<author>
<name sortKey="Egholm, M" uniqKey="Egholm M">M Egholm</name>
</author>
<author>
<name sortKey="Shen, Y" uniqKey="Shen Y">Y Shen</name>
</author>
<author>
<name sortKey="Chen, L" uniqKey="Chen L">L Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hernandez, D" uniqKey="Hernandez D">D Hernandez</name>
</author>
<author>
<name sortKey="Francois, P" uniqKey="Francois P">P François</name>
</author>
<author>
<name sortKey="Farinelli, L" uniqKey="Farinelli L">L Farinelli</name>
</author>
<author>
<name sortKey="Osteras, M" uniqKey="Osteras M">M Østerås</name>
</author>
<author>
<name sortKey="Schrenzel, J" uniqKey="Schrenzel J">J Schrenzel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schroder, J" uniqKey="Schroder J">J Schröder</name>
</author>
<author>
<name sortKey="Schroder, H" uniqKey="Schroder H">H Schröder</name>
</author>
<author>
<name sortKey="Puglisi, Sj" uniqKey="Puglisi S">SJ Puglisi</name>
</author>
<author>
<name sortKey="Sinha, R" uniqKey="Sinha R">R Sinha</name>
</author>
<author>
<name sortKey="Schmidt, B" uniqKey="Schmidt B">B Schmidt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dohm, Jc" uniqKey="Dohm J">JC Dohm</name>
</author>
<author>
<name sortKey="Lottaz, C" uniqKey="Lottaz C">C Lottaz</name>
</author>
<author>
<name sortKey="Borodina, T" uniqKey="Borodina T">T Borodina</name>
</author>
<author>
<name sortKey="Himmelbauer, H" uniqKey="Himmelbauer H">H Himmelbauer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Harismendy, O" uniqKey="Harismendy O">O Harismendy</name>
</author>
<author>
<name sortKey="Ng, P" uniqKey="Ng P">P Ng</name>
</author>
<author>
<name sortKey="Strausberg, R" uniqKey="Strausberg R">R Strausberg</name>
</author>
<author>
<name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author>
<name sortKey="Stockwell, T" uniqKey="Stockwell T">T Stockwell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Erlich, Y" uniqKey="Erlich Y">Y Erlich</name>
</author>
<author>
<name sortKey="Mitra, Pp" uniqKey="Mitra P">PP Mitra</name>
</author>
<author>
<name sortKey="Delabastide, M" uniqKey="Delabastide M">M delaBastide</name>
</author>
<author>
<name sortKey="Mccombie, Wr" uniqKey="Mccombie W">WR McCombie</name>
</author>
<author>
<name sortKey="Hannon, Gj" uniqKey="Hannon G">GJ Hannon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kircher, M" uniqKey="Kircher M">M Kircher</name>
</author>
<author>
<name sortKey="Stenzel, U" uniqKey="Stenzel U">U Stenzel</name>
</author>
<author>
<name sortKey="Kelso, J" uniqKey="Kelso J">J Kelso</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rougemont, J" uniqKey="Rougemont J">J Rougemont</name>
</author>
<author>
<name sortKey="Amzallag, A" uniqKey="Amzallag A">A Amzallag</name>
</author>
<author>
<name sortKey="Iseli, C" uniqKey="Iseli C">C Iseli</name>
</author>
<author>
<name sortKey="Farinelli, L" uniqKey="Farinelli L">L Farinelli</name>
</author>
<author>
<name sortKey="Xenarios, I" uniqKey="Xenarios I">I Xenarios</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chaisson, Mj" uniqKey="Chaisson M">MJ Chaisson</name>
</author>
<author>
<name sortKey="Brinza, D" uniqKey="Brinza D">D Brinza</name>
</author>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Qu, W" uniqKey="Qu W">W Qu</name>
</author>
<author>
<name sortKey="Hashimoto, Si" uniqKey="Hashimoto S">Si Hashimoto</name>
</author>
<author>
<name sortKey="Morishita, S" uniqKey="Morishita S">S Morishita</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ewing, B" uniqKey="Ewing B">B Ewing</name>
</author>
<author>
<name sortKey="Green, P" uniqKey="Green P">P Green</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<double pmid="20877643">
<pmc>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Reference-Free Validation of Short Read Data</title>
<author>
<name sortKey="Schroder, Jan" sort="Schroder, Jan" uniqKey="Schroder J" first="Jan" last="Schröder">Jan Schröder</name>
<affiliation wicri:level="4">
<nlm:aff id="aff1">
<addr-line>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria</wicri:regionArea>
<orgName type="university">Université de Melbourne</orgName>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>NICTA Victoria Research Laboratory, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>NICTA Victoria Research Laboratory, Parkville, Victoria</wicri:regionArea>
<wicri:noRegion>Victoria</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Bailey, James" sort="Bailey, James" uniqKey="Bailey J" first="James" last="Bailey">James Bailey</name>
<affiliation wicri:level="4">
<nlm:aff id="aff1">
<addr-line>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria</wicri:regionArea>
<orgName type="university">Université de Melbourne</orgName>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>NICTA Victoria Research Laboratory, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>NICTA Victoria Research Laboratory, Parkville, Victoria</wicri:regionArea>
<wicri:noRegion>Victoria</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Conway, Thomas" sort="Conway, Thomas" uniqKey="Conway T" first="Thomas" last="Conway">Thomas Conway</name>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>NICTA Victoria Research Laboratory, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>NICTA Victoria Research Laboratory, Parkville, Victoria</wicri:regionArea>
<wicri:noRegion>Victoria</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Zobel, Justin" sort="Zobel, Justin" uniqKey="Zobel J" first="Justin" last="Zobel">Justin Zobel</name>
<affiliation wicri:level="4">
<nlm:aff id="aff1">
<addr-line>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria</wicri:regionArea>
<orgName type="university">Université de Melbourne</orgName>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>NICTA Victoria Research Laboratory, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>NICTA Victoria Research Laboratory, Parkville, Victoria</wicri:regionArea>
<wicri:noRegion>Victoria</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">20877643</idno>
<idno type="pmc">2943903</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2943903</idno>
<idno type="RBID">PMC:2943903</idno>
<idno type="doi">10.1371/journal.pone.0012681</idno>
<date when="2010">2010</date>
<idno type="wicri:Area/Pmc/Corpus">001061</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">001061</idno>
<idno type="wicri:Area/Pmc/Curation">001061</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">001061</idno>
<idno type="wicri:Area/Pmc/Checkpoint">001352</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">001352</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Reference-Free Validation of Short Read Data</title>
<author>
<name sortKey="Schroder, Jan" sort="Schroder, Jan" uniqKey="Schroder J" first="Jan" last="Schröder">Jan Schröder</name>
<affiliation wicri:level="4">
<nlm:aff id="aff1">
<addr-line>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria</wicri:regionArea>
<orgName type="university">Université de Melbourne</orgName>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>NICTA Victoria Research Laboratory, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>NICTA Victoria Research Laboratory, Parkville, Victoria</wicri:regionArea>
<wicri:noRegion>Victoria</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Bailey, James" sort="Bailey, James" uniqKey="Bailey J" first="James" last="Bailey">James Bailey</name>
<affiliation wicri:level="4">
<nlm:aff id="aff1">
<addr-line>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria</wicri:regionArea>
<orgName type="university">Université de Melbourne</orgName>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>NICTA Victoria Research Laboratory, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>NICTA Victoria Research Laboratory, Parkville, Victoria</wicri:regionArea>
<wicri:noRegion>Victoria</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Conway, Thomas" sort="Conway, Thomas" uniqKey="Conway T" first="Thomas" last="Conway">Thomas Conway</name>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>NICTA Victoria Research Laboratory, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>NICTA Victoria Research Laboratory, Parkville, Victoria</wicri:regionArea>
<wicri:noRegion>Victoria</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Zobel, Justin" sort="Zobel, Justin" uniqKey="Zobel J" first="Justin" last="Zobel">Justin Zobel</name>
<affiliation wicri:level="4">
<nlm:aff id="aff1">
<addr-line>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria</wicri:regionArea>
<orgName type="university">Université de Melbourne</orgName>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2">
<addr-line>NICTA Victoria Research Laboratory, Parkville, Victoria, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>NICTA Victoria Research Laboratory, Parkville, Victoria</wicri:regionArea>
<wicri:noRegion>Victoria</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>High-throughput DNA sequencing techniques offer the ability to rapidly and cheaply sequence material such as whole genomes. However, the short-read data produced by these techniques can be biased or compromised at several stages in the sequencing process; the sources and properties of some of these biases are not always known. Accurate assessment of bias is required for experimental quality control, genome assembly, and interpretation of coverage results. An additional challenge is that, for new genomes or material from an unidentified source, there may be no reference available against which the reads can be checked.</p>
</sec>
<sec>
<title>Results</title>
<p>We propose analytical methods for identifying biases in a collection of short reads, without recourse to a reference. These, in conjunction with existing approaches, comprise a methodology that can be used to quantify the quality of a set of reads. Our methods involve use of three different measures: analysis of base calls; analysis of <italic>k</italic>-mers; and analysis of distributions of <italic>k</italic>-mers. We apply our methodology to a wide range of short read data and show that, surprisingly, strong biases appear to be present. These include gross overrepresentation of some poly-base sequences, per-position biases towards some bases, and apparent preferences for some starting positions over others.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>The existence of biases in short read data is known, but they appear to be greater and more diverse than identified in previous literature. Statistical analysis of a set of short reads can help identify issues prior to assembly or resequencing, and should help guide chemical or statistical methods for bias rectification.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Sanger, F" uniqKey="Sanger F">F Sanger</name>
</author>
<author>
<name sortKey="Nicklen, S" uniqKey="Nicklen S">S Nicklen</name>
</author>
<author>
<name sortKey="Coulson, Ar" uniqKey="Coulson A">AR Coulson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Von Bubnoff, A" uniqKey="Von Bubnoff A">A von Bubnoff</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Johnson, Ds" uniqKey="Johnson D">DS Johnson</name>
</author>
<author>
<name sortKey="Mortazavi, A" uniqKey="Mortazavi A">A Mortazavi</name>
</author>
<author>
<name sortKey="Myers, Rm" uniqKey="Myers R">RM Myers</name>
</author>
<author>
<name sortKey="Wold, B" uniqKey="Wold B">B Wold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mortazavi, A" uniqKey="Mortazavi A">A Mortazavi</name>
</author>
<author>
<name sortKey="Williams, Ba" uniqKey="Williams B">BA Williams</name>
</author>
<author>
<name sortKey="Mccue, K" uniqKey="Mccue K">K McCue</name>
</author>
<author>
<name sortKey="Schaeffer, L" uniqKey="Schaeffer L">L Schaeffer</name>
</author>
<author>
<name sortKey="Wold, B" uniqKey="Wold B">B Wold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
<author>
<name sortKey="Wang, W" uniqKey="Wang W">W Wang</name>
</author>
<author>
<name sortKey="Li, R" uniqKey="Li R">R Li</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Tian, G" uniqKey="Tian G">G Tian</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wheeler, Da" uniqKey="Wheeler D">DA Wheeler</name>
</author>
<author>
<name sortKey="Srinivasan, M" uniqKey="Srinivasan M">M Srinivasan</name>
</author>
<author>
<name sortKey="Egholm, M" uniqKey="Egholm M">M Egholm</name>
</author>
<author>
<name sortKey="Shen, Y" uniqKey="Shen Y">Y Shen</name>
</author>
<author>
<name sortKey="Chen, L" uniqKey="Chen L">L Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hernandez, D" uniqKey="Hernandez D">D Hernandez</name>
</author>
<author>
<name sortKey="Francois, P" uniqKey="Francois P">P François</name>
</author>
<author>
<name sortKey="Farinelli, L" uniqKey="Farinelli L">L Farinelli</name>
</author>
<author>
<name sortKey="Osteras, M" uniqKey="Osteras M">M Østerås</name>
</author>
<author>
<name sortKey="Schrenzel, J" uniqKey="Schrenzel J">J Schrenzel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schroder, J" uniqKey="Schroder J">J Schröder</name>
</author>
<author>
<name sortKey="Schroder, H" uniqKey="Schroder H">H Schröder</name>
</author>
<author>
<name sortKey="Puglisi, Sj" uniqKey="Puglisi S">SJ Puglisi</name>
</author>
<author>
<name sortKey="Sinha, R" uniqKey="Sinha R">R Sinha</name>
</author>
<author>
<name sortKey="Schmidt, B" uniqKey="Schmidt B">B Schmidt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dohm, Jc" uniqKey="Dohm J">JC Dohm</name>
</author>
<author>
<name sortKey="Lottaz, C" uniqKey="Lottaz C">C Lottaz</name>
</author>
<author>
<name sortKey="Borodina, T" uniqKey="Borodina T">T Borodina</name>
</author>
<author>
<name sortKey="Himmelbauer, H" uniqKey="Himmelbauer H">H Himmelbauer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Harismendy, O" uniqKey="Harismendy O">O Harismendy</name>
</author>
<author>
<name sortKey="Ng, P" uniqKey="Ng P">P Ng</name>
</author>
<author>
<name sortKey="Strausberg, R" uniqKey="Strausberg R">R Strausberg</name>
</author>
<author>
<name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author>
<name sortKey="Stockwell, T" uniqKey="Stockwell T">T Stockwell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Erlich, Y" uniqKey="Erlich Y">Y Erlich</name>
</author>
<author>
<name sortKey="Mitra, Pp" uniqKey="Mitra P">PP Mitra</name>
</author>
<author>
<name sortKey="Delabastide, M" uniqKey="Delabastide M">M delaBastide</name>
</author>
<author>
<name sortKey="Mccombie, Wr" uniqKey="Mccombie W">WR McCombie</name>
</author>
<author>
<name sortKey="Hannon, Gj" uniqKey="Hannon G">GJ Hannon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kircher, M" uniqKey="Kircher M">M Kircher</name>
</author>
<author>
<name sortKey="Stenzel, U" uniqKey="Stenzel U">U Stenzel</name>
</author>
<author>
<name sortKey="Kelso, J" uniqKey="Kelso J">J Kelso</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rougemont, J" uniqKey="Rougemont J">J Rougemont</name>
</author>
<author>
<name sortKey="Amzallag, A" uniqKey="Amzallag A">A Amzallag</name>
</author>
<author>
<name sortKey="Iseli, C" uniqKey="Iseli C">C Iseli</name>
</author>
<author>
<name sortKey="Farinelli, L" uniqKey="Farinelli L">L Farinelli</name>
</author>
<author>
<name sortKey="Xenarios, I" uniqKey="Xenarios I">I Xenarios</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chaisson, Mj" uniqKey="Chaisson M">MJ Chaisson</name>
</author>
<author>
<name sortKey="Brinza, D" uniqKey="Brinza D">D Brinza</name>
</author>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Qu, W" uniqKey="Qu W">W Qu</name>
</author>
<author>
<name sortKey="Hashimoto, Si" uniqKey="Hashimoto S">Si Hashimoto</name>
</author>
<author>
<name sortKey="Morishita, S" uniqKey="Morishita S">S Morishita</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ewing, B" uniqKey="Ewing B">B Ewing</name>
</author>
<author>
<name sortKey="Green, P" uniqKey="Green P">P Green</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
</pmc>
<pubmed>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Reference-free validation of short read data.</title>
<author>
<name sortKey="Schroder, Jan" sort="Schroder, Jan" uniqKey="Schroder J" first="Jan" last="Schröder">Jan Schröder</name>
<affiliation wicri:level="4">
<nlm:affiliation>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria, Australia. schroder@csse.unimelb.edu.au</nlm:affiliation>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria</wicri:regionArea>
<orgName type="university">Université de Melbourne</orgName>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Bailey, James" sort="Bailey, James" uniqKey="Bailey J" first="James" last="Bailey">James Bailey</name>
</author>
<author>
<name sortKey="Conway, Thomas" sort="Conway, Thomas" uniqKey="Conway T" first="Thomas" last="Conway">Thomas Conway</name>
</author>
<author>
<name sortKey="Zobel, Justin" sort="Zobel, Justin" uniqKey="Zobel J" first="Justin" last="Zobel">Justin Zobel</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2010">2010</date>
<idno type="RBID">pubmed:20877643</idno>
<idno type="pmid">20877643</idno>
<idno type="doi">10.1371/journal.pone.0012681</idno>
<idno type="wicri:Area/PubMed/Corpus">001F34</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001F34</idno>
<idno type="wicri:Area/PubMed/Curation">001F34</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001F34</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001E24</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001E24</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Reference-free validation of short read data.</title>
<author>
<name sortKey="Schroder, Jan" sort="Schroder, Jan" uniqKey="Schroder J" first="Jan" last="Schröder">Jan Schröder</name>
<affiliation wicri:level="4">
<nlm:affiliation>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria, Australia. schroder@csse.unimelb.edu.au</nlm:affiliation>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Victoria</wicri:regionArea>
<orgName type="university">Université de Melbourne</orgName>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Bailey, James" sort="Bailey, James" uniqKey="Bailey J" first="James" last="Bailey">James Bailey</name>
</author>
<author>
<name sortKey="Conway, Thomas" sort="Conway, Thomas" uniqKey="Conway T" first="Thomas" last="Conway">Thomas Conway</name>
</author>
<author>
<name sortKey="Zobel, Justin" sort="Zobel, Justin" uniqKey="Zobel J" first="Justin" last="Zobel">Justin Zobel</name>
</author>
</analytic>
<series>
<title level="j">PloS one</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2010" type="published">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Base Sequence</term>
<term>Data Interpretation, Statistical</term>
<term>High-Throughput Nucleotide Sequencing (methods)</term>
<term>High-Throughput Nucleotide Sequencing (standards)</term>
<term>Reference Standards</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Sequence Analysis, DNA (standards)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Analyse de séquence d'ADN (méthodes)</term>
<term>Analyse de séquence d'ADN (normes)</term>
<term>Interprétation statistique de données</term>
<term>Normes de référence</term>
<term>Séquence nucléotidique</term>
<term>Séquençage nucléotidique à haut débit (méthodes)</term>
<term>Séquençage nucléotidique à haut débit (normes)</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>High-Throughput Nucleotide Sequencing</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" qualifier="normes" xml:lang="fr">
<term>Analyse de séquence d'ADN</term>
<term>Séquençage nucléotidique à haut débit</term>
</keywords>
<keywords scheme="MESH" qualifier="standards" xml:lang="en">
<term>High-Throughput Nucleotide Sequencing</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Base Sequence</term>
<term>Data Interpretation, Statistical</term>
<term>Reference Standards</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Analyse de séquence d'ADN</term>
<term>Interprétation statistique de données</term>
<term>Normes de référence</term>
<term>Séquence nucléotidique</term>
<term>Séquençage nucléotidique à haut débit</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">High-throughput DNA sequencing techniques offer the ability to rapidly and cheaply sequence material such as whole genomes. However, the short-read data produced by these techniques can be biased or compromised at several stages in the sequencing process; the sources and properties of some of these biases are not always known. Accurate assessment of bias is required for experimental quality control, genome assembly, and interpretation of coverage results. An additional challenge is that, for new genomes or material from an unidentified source, there may be no reference available against which the reads can be checked.</div>
</front>
</TEI>
</pubmed>
</double>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Ncbi/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000777 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 000777 | SxmlIndent | more
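
The second form relies on `$EXPLOR_AREA`, which is not defined on this page. A minimal sketch, assuming it points at the MersV1 area root (inferred from the `EXPLOR_STEP` path above; the fallback value for `WICRI_ROOT` is a hypothetical placeholder):

```shell
# Assumption: WICRI_ROOT is the root of the local Wicri installation;
# the fallback path below is a hypothetical placeholder.
WICRI_ROOT="${WICRI_ROOT:-/path/to/wicri}"

# Assumption: EXPLOR_AREA is the MersV1 area root, inferred from EXPLOR_STEP.
EXPLOR_AREA="$WICRI_ROOT/Sante/explor/MersV1"
EXPLOR_STEP="$EXPLOR_AREA/Data/Ncbi/Merge"

# With these settings both HfdSelect commands above read the same file.
echo "$EXPLOR_STEP/biblio.hfd"
```

With both variables set this way, the two `HfdSelect` invocations should be interchangeable.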

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     PMC:2943903
   |texte=   Reference-Free Validation of Short Read Data
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:20877643" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021